{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 目标窗口检测算法-NMS非极大值抑制 #"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "YOLO在最后的一个步骤就是对 $ S*S*(B*5+C) $ 个向量进行非极大值抑制（Non-max suppression），一开始不是太明白非极大值抑制是如何操作的，也就是不太清楚YOLO最后做完卷积后如何对求得向量进行预测，求得目标框框位置。\n",
    "\n",
    "对YOLO代码分析完之后对其他步骤操作有了一个大致的认识之后，回顾最后一步非极大值抑制，发现非极大值抑制在R-CNN、Fast-RCNN都有用到的同样的概念，因此YOLO的论文并没有提到如何进行非极大值抑制。\n",
    "\n",
    "其实在物体检测领域当中，非极大值抑制应用十分广泛，目的是为了消除多余的框，找到最佳的物体检测的位置。那么具体如何操作呢？如下图所示，有三个boundingbox，其中第一个绿色boundingbox的置信度是0.7，第二个绿色boundingbox的置信度是0.6，第三个绿色boundingbox的置信度是0.7。非极大值抑制就是在这三个框当中选出置信度最高，且最有可能代表是目标的boundingbox。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](img/monroe0.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了研究透NMS到底是如何操作的，我来随机定义了4个窗口，置信度分别是[0.5, 0.7, 0.6, 0.7]，对应上图的多了一个方框。算法的核心：\n",
    "\n",
    "1. 把置信度最高的一个boundingbox(bbox)作为目标，然后对比剩下bbox与目标bbox之间的交叉区域\n",
    "2. 如果交叉区域大于设定的阈值，那么在剩下的bbox中去除该bbox（即使该bbox的置信度与目标bbox的置信度一样）----这个操作就是抑制最大重叠区域\n",
    "3. 把第二置信度高的bbox作为目标，重复1、2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "dets = np.array([\n",
    "                [204, 102, 358, 250, 0.5],\n",
    "                [257, 118, 380, 250, 0.7],\n",
    "                [280, 135, 400, 250, 0.6],\n",
    "                [255, 118, 360, 235, 0.7]])\n",
    "\n",
    "thresh = 0.3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3]\n"
     ]
    }
   ],
   "source": [
    "def nms(dets, thresh):\n",
    "    x1 = dets[:, 0]\n",
    "    y1 = dets[:, 1]\n",
    "    x2 = dets[:, 2]\n",
    "    y2 = dets[:, 3]\n",
    "    scores = dets[:, 4]\n",
    "    \n",
    "    areas = (x2 - x1 + 1) * (y2 - y1 + 1) # 每个boundingbox的面积\n",
    "    order = scores.argsort()[::-1] # boundingbox的置信度排序\n",
    "\n",
    "    keep = [] # 用来保存最后留下来的boundingbox\n",
    "    while order.size > 0:     \n",
    "        i = order[0] # 置信度最高的boundingbox的index\n",
    "        keep.append(i) # 添加本次置信度最高的boundingbox的index\n",
    "        \n",
    "        # 当前bbox和剩下bbox之间的交叉区域\n",
    "        # 选择大于x1,y1和小于x2,y2的区域\n",
    "        xx1 = np.maximum(x1[i], x1[order[1:]])\n",
    "        yy1 = np.maximum(y1[i], y1[order[1:]])\n",
    "        xx2 = np.minimum(x2[i], x2[order[1:]])\n",
    "        yy2 = np.minimum(y2[i], y2[order[1:]])\n",
    "        \n",
    "        # 当前bbox和其他剩下bbox之间交叉区域的面积\n",
    "        w = np.maximum(0.0, xx2 - xx1 + 1)\n",
    "        h = np.maximum(0.0, yy2 - yy1 + 1)\n",
    "        inter = w * h\n",
    "        \n",
    "        # 交叉区域面积 / (bbox + 某区域面积 - 交叉区域面积)\n",
    "        ovr = inter / (areas[i] + areas[order[1:]] - inter)\n",
    "\n",
    "        #保留交集小于一定阈值的boundingbox\n",
    "        inds = np.where(ovr <= thresh)[0]\n",
    "        order = order[inds + 1]\n",
    "        \n",
    "    return keep\n",
    "\n",
    "print nms(dets, thresh)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后的效果就是得到开始定义的4个bbox中的第4个(3):\n",
    "\n",
    "![](img/monroe1.jpg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [Root]",
   "language": "python",
   "name": "Python [Root]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
